data capture from web, books, data entry India, author, HTML conversion, publisher, book, novel, India authors, data extraction from web, keyboarding